Web Information Retrieval and Data Mining - Assignment 3

Student: SMIT, GIJS (0905883)

Answer 1.1 (4p)

Loading model from file
Success!
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_1 (Reshape)          (None, 3072)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 1152)              3540096   
_________________________________________________________________
dense_3 (Dense)              (None, 576)               664128    
_________________________________________________________________
dense_4 (Dense)              (None, 288)               166176    
_________________________________________________________________
dense_5 (Dense)              (None, 144)               41616     
_________________________________________________________________
dense_6 (Dense)              (None, 10)                1450      
=================================================================
Total params: 4,413,466
Trainable params: 4,413,466
Non-trainable params: 0
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.380644  0.384159  0.557180      0.527979
max  1.839650  0.886520  1.477031      0.845093

Our network consists of 4 hidden layers where each subsequent layer has half as many nodes: 1152-576-288-144.
We experimented extensively to find a good design: a layer that is too narrow discards information, while
a layer that is too wide encourages overfitting. The network achieved a decent validation accuracy of 84.5%
without overfitting too much. We used Adam as the optimizer because it converged fastest of the optimizers
we tried. Adam worked best in combination with a learning rate of 1.25e-5, a batch size of 12, and 40 epochs.
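The parameter counts in the summary above can be checked with a quick sketch: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases.

```python
# Parameter accounting for the dense network above (widths from the summary).
def dense_params(n_in, n_out):
    # weights plus biases of a fully connected layer
    return n_in * n_out + n_out

# flattened 32x32x3 input, four hidden layers, 10-class output
widths = [3072, 1152, 576, 288, 144, 10]
total = sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))
print(total)  # 4413466, matching "Total params: 4,413,466"
```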

Answer 1.2 (2p)

Loading model from file
Success!
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_2 (Reshape)          (None, 1024)              0         
_________________________________________________________________
dense_7 (Dense)              (None, 1152)              1180800   
_________________________________________________________________
dense_8 (Dense)              (None, 576)               664128    
_________________________________________________________________
dense_9 (Dense)              (None, 288)               166176    
_________________________________________________________________
dense_10 (Dense)             (None, 144)               41616     
_________________________________________________________________
dense_11 (Dense)             (None, 10)                1450      
=================================================================
Total params: 2,054,170
Trainable params: 2,054,170
Non-trainable params: 0
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.211858  0.480879  0.499776      0.633474
max  1.600264  0.940136  1.190097      0.856864

We converted the images to grayscale and increased the contrast to make the digits more distinguishable.
Applying standardization (zero mean and unit variance) caused significantly more overfitting, so we
left it out. The same model from 1.1 now achieved a validation accuracy of 85.7%, which is 1.2% better.
The overfitting is, however, much larger than in 1.1. We suspect that the dense network cannot exploit the
separate color channels, so it learns better features from the grayscale images.
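A minimal sketch of this preprocessing in NumPy, assuming images scaled to [0, 1]; the standard luminosity weights are used here, and the per-image contrast stretch is one simple way to increase contrast (the exact method we used may differ):

```python
import numpy as np

def to_grayscale(images):
    # images: (N, H, W, 3) floats in [0, 1] -> (N, H, W) grayscale
    return images @ np.array([0.299, 0.587, 0.114])

def stretch_contrast(gray, eps=1e-8):
    # rescale each image so its darkest pixel is 0 and brightest is 1
    lo = gray.min(axis=(1, 2), keepdims=True)
    hi = gray.max(axis=(1, 2), keepdims=True)
    return (gray - lo) / (hi - lo + eps)
```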

Answer 1.3 (4p)

Loading model from file
Success!
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_5 (Reshape)          (None, 1024)              0         
_________________________________________________________________
block1_dense (Dense)         (None, 1152)              1180800   
_________________________________________________________________
block1_batchnorm (BatchNorma (None, 1152)              4608      
_________________________________________________________________
block1_dropout (Dropout)     (None, 1152)              0         
_________________________________________________________________
block2_dense (Dense)         (None, 576)               664128    
_________________________________________________________________
block2_batchnorm (BatchNorma (None, 576)               2304      
_________________________________________________________________
block2_dropout (Dropout)     (None, 576)               0         
_________________________________________________________________
block3_dense (Dense)         (None, 288)               166176    
_________________________________________________________________
block3_batchnorm (BatchNorma (None, 288)               1152      
_________________________________________________________________
block3_dropout (Dropout)     (None, 288)               0         
_________________________________________________________________
block4_dense (Dense)         (None, 144)               41616     
_________________________________________________________________
block4_batchnorm (BatchNorma (None, 144)               576       
_________________________________________________________________
block4_dropout (Dropout)     (None, 144)               0         
_________________________________________________________________
block5_fc (Dense)            (None, 10)                1450      
=================================================================
Total params: 2,062,810
Trainable params: 2,058,490
Non-trainable params: 4,320
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.347027  0.602071  0.377585      0.623655
max  1.235725  0.888093  1.173410      0.888777

We regularized the model by adding batchnorm and dropout after each dense layer. L1 and L2 regularization
gave no noticeable improvement on top of batchnorm and dropout, so we left them out. The
regularized model achieved a validation accuracy of 88.7%, which is a 3.0% improvement over 1.2. Also,
the model is no longer overfitting. Regularization has decreased the gap between validation and
train accuracy and has improved the overall performance of the dense network.
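The non-trainable parameters in the summary above come entirely from batchnorm, which can be verified with a short sketch: BatchNormalization adds 4 parameters per unit (gamma, beta, moving mean, moving variance), of which the two moving statistics are non-trainable.

```python
# Batchnorm parameter bookkeeping for the four regularized hidden layers.
def batchnorm_params(units):
    # (total params, non-trainable params) for one BatchNormalization layer
    return 4 * units, 2 * units

bn_widths = [1152, 576, 288, 144]
total_bn = sum(batchnorm_params(u)[0] for u in bn_widths)
frozen = sum(batchnorm_params(u)[1] for u in bn_widths)
print(total_bn, frozen)  # 8640 4320; 4320 matches "Non-trainable params"
```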

Answer 2.1 (7p)

Loading model from file
Success!
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv (Conv2D)         (None, 32, 32, 64)        640       
_________________________________________________________________
block1_batchnorm (BatchNorma (None, 32, 32, 64)        256       
_________________________________________________________________
block1_dropout (Dropout)     (None, 32, 32, 64)        0         
_________________________________________________________________
block2_conv (Conv2D)         (None, 32, 32, 128)       73856     
_________________________________________________________________
block2_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block2_dropout (Dropout)     (None, 32, 32, 128)       0         
_________________________________________________________________
block3_conv (Conv2D)         (None, 32, 32, 128)       147584    
_________________________________________________________________
block3_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block3_dropout (Dropout)     (None, 32, 32, 128)       0         
_________________________________________________________________
block4_conv (Conv2D)         (None, 32, 32, 128)       147584    
_________________________________________________________________
block4_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block4_pooling (MaxPooling2D (None, 16, 16, 128)       0         
_________________________________________________________________
block4_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block5_conv (Conv2D)         (None, 16, 16, 128)       147584    
_________________________________________________________________
block5_batchnorm (BatchNorma (None, 16, 16, 128)       512       
_________________________________________________________________
block5_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block6_conv (Conv2D)         (None, 16, 16, 128)       147584    
_________________________________________________________________
block6_batchnorm (BatchNorma (None, 16, 16, 128)       512       
_________________________________________________________________
block6_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block7_conv (Conv2D)         (None, 16, 16, 256)       295168    
_________________________________________________________________
block7_batchnorm (BatchNorma (None, 16, 16, 256)       1024      
_________________________________________________________________
block7_pooling (MaxPooling2D (None, 8, 8, 256)         0         
_________________________________________________________________
block7_dropout (Dropout)     (None, 8, 8, 256)         0         
_________________________________________________________________
block8_conv (Conv2D)         (None, 8, 8, 256)         590080    
_________________________________________________________________
block8_batchnorm (BatchNorma (None, 8, 8, 256)         1024      
_________________________________________________________________
block8_dropout (Dropout)     (None, 8, 8, 256)         0         
_________________________________________________________________
block9_conv (Conv2D)         (None, 8, 8, 256)         590080    
_________________________________________________________________
block9_batchnorm (BatchNorma (None, 8, 8, 256)         1024      
_________________________________________________________________
block9_pooling (MaxPooling2D (None, 4, 4, 256)         0         
_________________________________________________________________
block9_dropout (Dropout)     (None, 4, 4, 256)         0         
_________________________________________________________________
block10_conv (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block10_batchnorm (BatchNorm (None, 4, 4, 512)         2048      
_________________________________________________________________
block10_dropout (Dropout)    (None, 4, 4, 512)         0         
_________________________________________________________________
block11_conv (Conv2D)        (None, 4, 4, 2048)        1050624   
_________________________________________________________________
block11_dropout (Dropout)    (None, 4, 4, 2048)        0         
_________________________________________________________________
block12_conv (Conv2D)        (None, 4, 4, 256)         524544    
_________________________________________________________________
block12_pooling (MaxPooling2 (None, 2, 2, 256)         0         
_________________________________________________________________
block12_dropout (Dropout)    (None, 2, 2, 256)         0         
_________________________________________________________________
block13_conv (Conv2D)        (None, 2, 2, 256)         590080    
_________________________________________________________________
block13_pooling (MaxPooling2 (None, 1, 1, 256)         0         
_________________________________________________________________
block13_dropout (Dropout)    (None, 1, 1, 256)         0         
_________________________________________________________________
block14_flatten (Flatten)    (None, 256)               0         
_________________________________________________________________
block14_fc (Dense)           (None, 10)                2570      
=================================================================
Total params: 5,496,074
Trainable params: 5,492,106
Non-trainable params: 3,968
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.098163  0.173848  0.183210      0.202430
max  2.351455  0.970965  2.369631      0.956631

We implemented a network inspired by the SimpleNet architecture (HasanPour et al., 2016). SimpleNet
is a relatively simple architecture that can outperform deeper and more complex architectures (e.g. on
CIFAR-10) and offers a good trade-off between computational efficiency and accuracy. SimpleNet consists
of homogeneous building blocks, each containing a conv, batchnorm, and dropout layer. We adjusted the
ordering in each block so that batchnorm is applied after each conv layer (instead of before). We trained
the model for 25 epochs using Adam as the optimizer. The trained model overfits slightly. This is
intentional, leaving room for the model to learn more once data augmentation is added. The model achieved
a validation accuracy of 95.7%, which is decent for such a small network.
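The Conv2D parameter counts in the summary above follow the usual formula, which a small sketch makes explicit: a k x k convolution with c_in input channels and c_out filters has k*k*c_in*c_out weights plus c_out biases.

```python
# Conv2D parameter formula, checked against the SimpleNet-style summary above.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

print(conv_params(3, 1, 64))      # block1 on grayscale input -> 640
print(conv_params(3, 64, 128))    # block2 -> 73856
print(conv_params(1, 512, 2048))  # block11, one of the 1x1 convs -> 1050624
```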

Answer 2.2 (3p)

Loading model from file
Success!
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv (Conv2D)         (None, 32, 32, 64)        640       
_________________________________________________________________
block1_batchnorm (BatchNorma (None, 32, 32, 64)        256       
_________________________________________________________________
block1_dropout (Dropout)     (None, 32, 32, 64)        0         
_________________________________________________________________
block2_conv (Conv2D)         (None, 32, 32, 128)       73856     
_________________________________________________________________
block2_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block2_dropout (Dropout)     (None, 32, 32, 128)       0         
_________________________________________________________________
block3_conv (Conv2D)         (None, 32, 32, 128)       147584    
_________________________________________________________________
block3_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block3_dropout (Dropout)     (None, 32, 32, 128)       0         
_________________________________________________________________
block4_conv (Conv2D)         (None, 32, 32, 128)       147584    
_________________________________________________________________
block4_batchnorm (BatchNorma (None, 32, 32, 128)       512       
_________________________________________________________________
block4_pooling (MaxPooling2D (None, 16, 16, 128)       0         
_________________________________________________________________
block4_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block5_conv (Conv2D)         (None, 16, 16, 128)       147584    
_________________________________________________________________
block5_batchnorm (BatchNorma (None, 16, 16, 128)       512       
_________________________________________________________________
block5_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block6_conv (Conv2D)         (None, 16, 16, 128)       147584    
_________________________________________________________________
block6_batchnorm (BatchNorma (None, 16, 16, 128)       512       
_________________________________________________________________
block6_dropout (Dropout)     (None, 16, 16, 128)       0         
_________________________________________________________________
block7_conv (Conv2D)         (None, 16, 16, 256)       295168    
_________________________________________________________________
block7_batchnorm (BatchNorma (None, 16, 16, 256)       1024      
_________________________________________________________________
block7_pooling (MaxPooling2D (None, 8, 8, 256)         0         
_________________________________________________________________
block7_dropout (Dropout)     (None, 8, 8, 256)         0         
_________________________________________________________________
block8_conv (Conv2D)         (None, 8, 8, 256)         590080    
_________________________________________________________________
block8_batchnorm (BatchNorma (None, 8, 8, 256)         1024      
_________________________________________________________________
block8_dropout (Dropout)     (None, 8, 8, 256)         0         
_________________________________________________________________
block9_conv (Conv2D)         (None, 8, 8, 256)         590080    
_________________________________________________________________
block9_batchnorm (BatchNorma (None, 8, 8, 256)         1024      
_________________________________________________________________
block9_pooling (MaxPooling2D (None, 4, 4, 256)         0         
_________________________________________________________________
block9_dropout (Dropout)     (None, 4, 4, 256)         0         
_________________________________________________________________
block10_conv (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block10_batchnorm (BatchNorm (None, 4, 4, 512)         2048      
_________________________________________________________________
block10_dropout (Dropout)    (None, 4, 4, 512)         0         
_________________________________________________________________
block11_conv (Conv2D)        (None, 4, 4, 2048)        1050624   
_________________________________________________________________
block11_dropout (Dropout)    (None, 4, 4, 2048)        0         
_________________________________________________________________
block12_conv (Conv2D)        (None, 4, 4, 256)         524544    
_________________________________________________________________
block12_pooling (MaxPooling2 (None, 2, 2, 256)         0         
_________________________________________________________________
block12_dropout (Dropout)    (None, 2, 2, 256)         0         
_________________________________________________________________
block13_conv (Conv2D)        (None, 2, 2, 256)         590080    
_________________________________________________________________
block13_pooling (MaxPooling2 (None, 1, 1, 256)         0         
_________________________________________________________________
block13_dropout (Dropout)    (None, 1, 1, 256)         0         
_________________________________________________________________
block14_flatten (Flatten)    (None, 256)               0         
_________________________________________________________________
block14_fc (Dense)           (None, 10)                2570      
=================================================================
Total params: 5,496,074
Trainable params: 5,492,106
Non-trainable params: 3,968
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.102976  0.187775  0.141312      0.317052
max  2.346148  0.969360  1.956509      0.965632

The augmentations we implemented are small rotations, height shifts, channel shifts, and shears. Larger
rotations and shifts would destroy too much information. We did not implement width shifts because some
images contain multiple digits next to the center digit. As the images are grayscale, we also implemented
a custom augmentation function that inverts the black and white values of an image. Using the
augmentations we achieved a validation accuracy of 96.6%, which is a 0.9% improvement.
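The custom inversion augmentation can be sketched as follows, assuming grayscale images scaled to [0, 1]; the inversion probability p is an assumption here, not the value we actually used.

```python
import numpy as np

def random_invert(batch, p=0.5, seed=None):
    # batch: (N, H, W, 1) grayscale floats in [0, 1]
    rng = np.random.default_rng(seed)
    flip = rng.random(len(batch)) < p  # choose which images to invert
    out = batch.copy()
    out[flip] = 1.0 - out[flip]        # swap black and white values
    return out
```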

Answer 3.1 (2p)

The accuracy of the model on the test data is 96.4%. A small accuracy boost is possible by
applying the same augmentations used during training to the test set and averaging multiple
predictions. This way we increased the accuracy to 96.6%, which is a 0.2% improvement. Note that this
method does not cause any data leakage because we only use the test images, not the labels. This is
quite a good accuracy compared to state-of-the-art benchmarks, which typically reach between
96% and 99% accuracy.
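This test-time augmentation trick can be sketched as follows; `predict` and `augment` stand in for the trained model and the augmentation pipeline (both names are hypothetical placeholders).

```python
import numpy as np

def tta_predict(predict, augment, images, n_aug=8):
    # average class probabilities over several augmented copies of each image
    probs = np.mean([predict(augment(images)) for _ in range(n_aug)], axis=0)
    return probs.argmax(axis=1)
```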

According to the confusion matrix the classes 1, 2 and 7 are often confused and also digits
3, 5, and 9. These digits are quite similar in appearence. We plotted misclassifications of
class 2. It appears that most of the errors are made on images that are unclear, noisy, or
have multiple digits.

Answer 3.2 (4p)

We plotted the activations of 8 convolutional layers located at different depths in the network. These
8 plots give an idea of the types of features extracted from the input image. In the first few
layers the features are quite interpretable, such as edges and shapes, but deeper in the network
they become more abstract. Note that the model also has two conv layers with 1x1 filters; these
filters were not interesting to visualize, as each would be shown as a single square.
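The mechanics of tiling one layer's feature maps into a single image for plotting can be sketched in NumPy (the actual plotting, e.g. with plt.imshow, is omitted):

```python
import numpy as np

def activation_grid(feature_maps, cols=8):
    # feature_maps: (H, W, C) activations of one layer for one input image
    h, w, c = feature_maps.shape
    rows = -(-c // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w))
    for i in range(c):
        r, col = divmod(i, cols)
        grid[r*h:(r+1)*h, col*w:(col+1)*w] = feature_maps[:, :, i]
    return grid
```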

Answer 3.3 (4p)


Answer 4.1 (5p)

Not-trainable: input_2
Not-trainable: block1_conv1
Not-trainable: block1_conv2
Not-trainable: block1_pool
Not-trainable: block2_conv1
Not-trainable: block2_conv2
Not-trainable: block2_pool
Trainable: block3_conv1
Trainable: block3_conv2
Trainable: block3_conv3
Trainable: block3_pool
Trainable: block4_conv1
Trainable: block4_conv2
Trainable: block4_conv3
Trainable: block4_pool
Trainable: block5_conv1
Trainable: block5_conv2
Trainable: block5_conv3
Trainable: block5_pool
Trainable: block6_flatten
Trainable: block6_dropout1
Trainable: block6_fc1
Trainable: block6_dropout2
Trainable: block6_fc2
Loading model from file
Success!
Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
block6_flatten (Flatten)     (None, 512)               0         
_________________________________________________________________
block6_dropout1 (Dropout)    (None, 512)               0         
_________________________________________________________________
block6_fc1 (Dense)           (None, 128)               65664     
_________________________________________________________________
block6_dropout2 (Dropout)    (None, 128)               0         
_________________________________________________________________
block6_fc2 (Dense)           (None, 10)                1290      
=================================================================
Total params: 14,781,642
Trainable params: 14,521,482
Non-trainable params: 260,160
_________________________________________________________________
None
         loss  accuracy  val_loss  val_accuracy
min  0.182973  0.614126  0.245493      0.847297
max  1.180056  0.951278  0.502027      0.932398

Fully freezing the original conv base resulted in poor accuracy when retraining the model, and its
embeddings would also give low accuracy in 4.2. So, we had to fine-tune the base so that the
embeddings would be more useful for our data. We found that unfreezing more blocks yields
better embeddings, but to avoid erasing too much of the pretrained features,
we unfroze only the last three convolutional blocks and retrained them with a very
small learning rate.
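The non-trainable total in the summary above can be verified with a short sketch: only the four conv layers of VGG16's blocks 1 and 2 are frozen, and a k x k Conv2D has k*k*c_in*c_out weights plus c_out biases.

```python
# Frozen-parameter accounting for the partially frozen VGG16 base above.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

frozen = (conv_params(3, 3, 64)        # block1_conv1
          + conv_params(3, 64, 64)     # block1_conv2
          + conv_params(3, 64, 128)    # block2_conv1
          + conv_params(3, 128, 128))  # block2_conv2
print(frozen)  # 260160, matching "Non-trainable params: 260,160"
```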

Answer 4.2 (5p)

Pipeline(memory=None,
         steps=[('classifier',
                 KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                      metric='minkowski', metric_params=None,
                                      n_jobs=-1, n_neighbors=15, p=2,
                                      weights='uniform'))],
         verbose=False)
Accuracy on validation set: 0.9275
Accuracy on test set: 0.9268
Evaluation: 0.9267801389868063
Running time: 195.08 seconds
Last modified: April 20, 2020
scikit-learn version: 0.22.2.post1
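The k-nearest-neighbour classification of the embeddings can be sketched in plain NumPy (the actual run used scikit-learn's KNeighborsClassifier with n_neighbors=15, as shown in the pipeline above):

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=15):
    # Euclidean distances between each query and all training embeddings
    d = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest points
    votes = train_y[nearest]
    # majority vote among the k nearest neighbours
    return np.array([np.bincount(v).argmax() for v in votes])
```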